Whole-genome sequence analysis reveals selection signatures for important economic traits in Xiang pigs

Xiang pig (XP) is one of the best-known indigenous pig breeds in China, which is characterized by its small body size, strong disease resistance, high adaptability, favorite meat quality, small litter sizes, and early sexual maturity. However, the genomic evidence that links these unique traits of XP is still poorly understood. To identify the genomic signatures of selection in XP, we performed whole-genome resequencing on 25 unrelated individual XPs. We obtained 876.70 Gb of raw data from the genomic libraries. The LD analysis showed that the lowest level of linkage disequilibrium was observed in Xiang pig. Comparative genomic analysis between XPs and other breeds including Tibetan, Meishan, Duroc and Landrace revealed 3062, 1228, 907 and 1519 selected regions, respectively. The genes identified in selected regions of XPs were associated with growth and development processes (IGF1R, PROP1, TBX19, STAC3, RLF, SELENOM, MSTN), immunity and disease resistance (ZCCHC2, SERPINB2, ADGRE5, CYP7B1, STAT6, IL2, CD80, RHBDD3, PIK3IP1), environmental adaptation (NR2E1, SERPINB8, SERPINB10, SLC26A7, MYO1A, SDR9C7, UVSSA, EXPH5, VEGFC, PDE1A), reproduction (CCNB2, TRPM6, EYA3, CYP7B1, LIMK2, RSPO1, ADAM32, SPAG16), meat quality traits (DECR1, EWSR1), and early sexual maturity (TAC3). Through the absolute allele frequency difference (ΔAF) analysis, we explored two population-specific missense mutations occurred in NR6A1 and LTBP2 genes, which well explained that the vertebrae numbers of Xiang pigs were less than that of the European pig breeds. Our results indicated that Xiang pigs were less affected by artificial selection than the European and Meishan pig breeds. The selected candidate genes were mainly involved in growth and development, disease resistance, reproduction, meat quality, and early sexual maturity. This study provided a list of functional candidate genes, as well as a number of genetic variants, which would provide insight into the molecular basis for the unique traits of Xiang pig.

Identification of heritable variants. In total, 21,885,854 SNPs and 5,761,765 INDELs were obtained from Xiang pig genome (Table 1). Figure 1a showed the densities of SNPs and INDELs on each chromosome of XPs. We found that the maximum density of SNPs and INDELs was located on chromosomes 10 and 7, respectively. Of the total SNPs in XPs, 18,375,288 SNPs (83.96%) were identified in the dbSNP database, and 3,510,566 (16.04%) SNPs were newly found. Compared with other four pig breeds, 3,985,444 variants in our data set are Xiang-specific SNPs (Fig. S1).
Population genetic structure and linkage disequilibrium. To assess the genetic structure among the pig breeds in this study, principal component analysis (PCA), neighbor-joining (NJ) trees, and ADMIXTURE were performed. The analysis of PCA (Fig. 2a) revealed that the whole pig population was divided into five groups: XP, TT, MS, DU and LW. The first component (PC1) separated the European-originated pig (DU and LW) from Chinese pig (XP, TT and MS). The second component separated XP and TT from MS. Meanwhile, DU and LW were divided by PC2. The branches of the phylogenetic tree ( Fig. 2b) were consistent with the results of PCA, and successfully divided into five populations displaying genetically distinct clusters. Further population admixture analysis (Fig. 2c) clearly distinguished two breeds as expected based on their origin at K = 2. At K = 3, the European pig populations were separated, whereas the Chinese populations were still clustered together. At K = 4, MS were separated from Chinese pig populations. A separate ancestry for all four populations was visible at K = 5. With respect to the LD comparison among five populations (Fig. 3), we observed similar LD decay rates in Meishan and Duroc populations and the lowest level of LD in TT breed. XP had a lower level of LD compared to MS, DU and LW (Fig. 3).

Identification of selection signatures in XPs.
In order to detect the genome-wide selection signatures, we calculated pairwise Fst and θπ in 100-kb sliding windows with a step size of 10 kb across the autosomes    (Fig. 6a). Moreover, XP had different haplotype patterns with other four pig breeds (Fig. 6b), suggesting a strong selection for XP in this region. In addition, one gene on SSC4, five genes on SSC5 and one gene on SSC15 were selected for further research based on their known biological function, including SLC26A7, SDR9C7, RDH16, TAC3, MYO1A, LYZ, PDE1A. We found that the haplotype patterns of the SDR9C7 and RDH16 genes in XPs were different from that of the other four populations (Fig. 7), which suggested a strong selection for XP in these two genes. To analyze the signs of selection in detail, we detected the SNPs with highly significant effects, which located in exons, UTRs and downstream/upstream of the seven genes mentioned above. We found 36 SNPs with different mutation frequencies between the XP and the other four pig breeds in SDR9C7, RDH16, TAC3, MYO1A, LYZ, and PDE1A (Table S5). The genotype pattern of these SNPs showed significant differences among the five pig breeds (Fig. S2). We also noted that twelve of these 36 variants were nonsynonymous mutations (Table S6). Of note, a novel missense variant/splice variant (c.G895A; chr5:22,354,739) in frequency of the derived allele "A" was 0.98 in XPs (Table S6). In TAC3, one nonsynonymous variants (G > A; p.R84C; chr5:22,329,285) was predicted as functional-altering variant (SIFT = 0) ( Table S6). The p.R84C variant in XPs was significantly different from MS, DU and LW populations (Fig. 8).
Next, we evaluated the overlap between the candidate genes in Fst-θπ analysis and those detected genes in the absolute allele frequency difference (ΔAF) analysis in each comparison. In total, we obtained 174, 132, 186, and 308 genes overlapped among the results from Fst-θπ and ΔAF analysis in the TT vs XP, MS vs XP, DU vs XP, and LW vs XP comparisons, respectively (Table S11). Many of these overlapped genes have already been mentioned above, such as ZCCHC2 and TBX19. Especially the ZCCHC2 gene, it was commonly identified by Fst-θπ and ΔAF analysis in all four comparisons. These results supported by Fst, θπ and ΔAF methods further indicated the reliability of the highlighted genes in this study.  www.nature.com/scientificreports/ Considering the impact of nonsynonymous SNPs (nsSNPs) on protein function, we further investigated the genes detected only by ΔAF analysis that contain missense variants. In total, we explored 143 missense variants with marked allele frequency differences distributed in 89 functional genes that might affect domestication or phenotypic traits of XPs among all four comparisons (Table S12). For example, we identified two missense mutations with the highest ΔAF (ΔAF = 1) occurring in NR6A1 and LTBP2 in both DU vs XP and LW vs XP comparisons. The first was a c.T485C (p.L162P, rs326780270) substitution in NR6A1 (nuclear receptor subfamily 6 group A member 1), which encodes an orphan nuclear receptor and has been considered a strong candidate for affecting the numbers of vertebrae in swine 42 . The other was a c.T2732C (p.M911T, rs328662847) substitution in LTBP2 (latent transforming growth factor beta binding protein 2) that are associated with thoracic vertebrae numbers in the pig population 43 . Our results indicated that both of the two genes might play important roles in the vertebrae development of Xiang pigs. Moreover, we detected other important missense variants with marked allele frequency differences occurring in the functional genes. All of the strongly differentiated missense variants identified from the four comparisons were presented in Table S12.
GO terms and KEGG pathway enrichment analyses. Subsequently, we searched for significantly overrepresented (P-value < 0.05) GO terms and KEGG pathways related to the candidate genes with gene symbols identified from both Fst and θπ statistics in each comparison. In the TT vs XP comparison, 427 candidate genes under selection were used for the GO and KEGG analysis, which resulted in 224 significant GO terms and 18 statistically significant KEGG terms. The top three most significant KEGG terms included cellular senescence, oocyte meiosis, and progesterone-mediated oocyte maturation terms (Fig. 10a). Among the 224 significant GO categories, we detected the terms involved in adaption, including "positive regulation of bone resorption", "locomotory behavior", "actin cytoskeleton", "skin development", "sensory perception of sound", Table 2. A partial list of candidate genes with previous evidences that were shared among two or three comparisons datasets. www.nature.com/scientificreports/ "actin cytoskeleton", "actin cytoskeleton reorganization" (Table S13-1). In the MS vs XP comparison, 199 selected genes resulted in 187 significantly enriched GO terms (Table S13-2) and 12 KEGG pathways (Fig. 10b). Among the significantly enriched GO categories, the highest counts were associated with the immune system. The 12 KEGG enrichment pathways were significantly enriched in pathways involving cancer, bacterial invasion of epithelial cells, Fanconi anemia pathway, calcium signaling pathway, and other important biological processes. For the comparison between XPs and DUs, 180 selected genes resulted in 161 significant GO terms (Table S13-3) and 20 KEGG enrichment terms (Fig. 10c). The top three most significant terms were associated with the nucleus (GO:0005634; p-value = 1.72E−08), nucleoplasm (GO:0005654; p-value = 3.65E−06), and cytoplasm (GO:0005737; p-value = 1.08E−05) terms. In addition, some of these significantly enriched terms were related to morphogenesis and development. Disease-resistance related processes displayed the greatest KEGG enrichments, such as Th1 and Th2 cell differentiation, graft-versus-host disease, Th17 cell differentiation, and intestinal immune network for IgA production. Six enriched terms were related to metabolism. In the comparison group   (Fig. 10d). Of which, eight GO categories were associated with muscle development and growth, including sarcomere, actin filament binding, actin cytoskeleton, actin monomer binding, actin binding, filamentous actin, negative regulation of cell growth, and regulation of growth. Moreover, two GO categories were involved in male reproduction, including spermatogenesis, sperm axoneme assembly and fertilization. The Wnt signaling pathway displayed the greatest KEGG enrichment. The remaining enriched pathways were primarily related to metabolism, secretion and disease (Fig. 10d).

Discussion
It is well known that pig is one of the first domesticated animals. During the process of domestication and breeding, natural and artificial selection caused many changes of pig in body conformation, production performance, immune function, and adaptability to the environment 44 . The distinct imprints have been left in the genomes of domestic pigs. In recent years, selection signatures have been used to elucidate the genetic basis of many complex traits in agricultural animals 5,6 . In the present study, we performed the whole-genome resequencing of 25 unrelated individual Xiang pigs, and conducted population structure and LD analysis for the Xiang pig and other four breeds (i.e., Tibetan, Meishan, Duroc and Yorkshire). Our analyses revealed the population structure was mostly correlated with the geographic distribution. Xiang pigs displayed lower LD decay, indicating that Xiang pigs are less affected by artificial selection than the European and Meishan pig breeds. In addition, we explored the selection signatures that help provide potential genomic evidence linking the domestication of Xiang pigs with their breed characteristics.
To screen genome-wide regions under positive selection in Xiang pigs, we first selected windows with the top 5% of Fst and θπ values and identified 38 genes in the intersection of all four comparisons. Of those, seven genes (ZCCHC2, SLC26A7, SDR9C7, RDH16, TAC3, MYO1A, LYZ, and PDE1A) were involved in important biological processes. Except for PDE1A, other seven selection signatures were also confirmed by ΔAF analysis. ZCCHC2, as a member of the zinc finger CCHC-type (ZCCHC) superfamily protein, has nucleic acid-binding A SNP in ZCCHC2 is associated with insect bite hypersensitivity in Exmoor ponies 46 Besides, ZCCHC2 has previously been identified as important gene contributing to the benign (BEN) phenotypes of SIV/HIV infections, suggesting its role in the host defense against virus infections 47 . Intriguingly, ZCCHC2 showed high Fst and θπ scores in all four comparisons, implicating that ZCCHC2 gene could have been preferentially selected in XP. Our findings supported that ZCCHC2 likely played roles in disease resistance process of XP. SLC26A7 on SSC4 is one of the sulfate/anion transporter gene families and involves in the regulation of intracellular pH through chloride channels. It has been identified as a promising candidate gene for plateau adaptability in Tibetan pigs 48 . Interestingly, we found that four neighbor protein-coding genes (SDR9C7, RDH16, TAC3 and MYO1A) on SSC5  were also under selection in XP. SDR9C7 is a well-recognized gene for skin barrier formation due to its role in dehydrogenation of acylceramides 49 . RDH16, participating in energy and metabolism processes in adipose tissues in pigs and rats, has been shown to be one of the functional genes involved in the body size of Guizhou small goats 50 . Thus, it was reasonable to predict that RDH16 may be a promising candidate gene associated with the body size of the XP. TAC3 encoding neurokinin B (NKB) plays an important role in regulating the human reproductive axis. A previous study has demonstrated that TAC3 is a key regulator of kisspeptin expression and GnRH pulse secretion 51 . A distinctive finding indicate that the variation of A63P in TAC3 may increase the risk of earlier puberty onset in Chinese girls 52 . The results indicated that TAC3 variants in XP, especially one nonsynonymous variants and one functional-altering variant, may contribute to sexual maturity of XP by regulating kisspeptin expression and GnRH secretion. MYO1A (myosin IA) is a candidate gene that lightens the coat color of cattle, which contributes to heat stress resistance in cattle 53 . In humans, variants in the MYO1A gene are reported in non-syndromic sensorineural hearing loss 54 . In the present study, we detected a novel missense variant/splice variant (p.V299I) in MYO1A. The splice variant may affect MYO1A expression. This led us to hypothesize that the p.V299IA splice variant may have a role in local adaptation by regulating coat color and/or hearing in Xiang pigs. LYZ is an antibacterial enzyme which damages bacterial cell walls by catalyzing the hydrolysis of the b-(1,4)-glycosidic linkage between N-acetylglucosamine and the muramic acid of the peptidoglycan layer. LYZ displays antimicrobial, antitumoral and immunomodulatory properties, which have been extensively studied 55,56 . PDE1A are reported to be one of the hypoxia-related genes in Tibetan pig 6 .
To accurately detect the positively selected genes under selection in Xiang pig, we further explored the overlapped selected genes between two or three comparisons datasets. We identified seven genes implicated in growth and development processes (IGF1R, PROP1, TBX19, STAC3, RLF, SELENOM, MSTN). In TBX19, we simultaneously observed strong selective sweep signals (Fig. 9a) and 364 high-ΔAF SNPs in the DU vs XP and LW vs XP comparisons (Table S7). Among the list of these high-ΔAF SNPs, three missense mutations (N105T, M158V, and T189A) were within the evolutionary conserved T-box domain and were predicted as function-altering variants (Fig. 9b). This result was well consistent with that found in Chinese domestic pigs 23 . TBX19 (TPIT) functions by activating pituitary cell differentiation, and is also a positive regulator for pro-opiomelanocortin (POMC) expression, which contributes to the production of the adrenocorticotropic hormone (ACTH) in the hypothalamic-pituitary-adrenal (HPA) axis 57 . Mutations in this gene may lead to the isolated deficiency of pituitary POMC-derived ACTH 57,58 . Patients with ACTH deficiency are clinically characterized by severe obesity, weight loss, lack of appetite, hypoglycaemia and low blood pressure 58 . Previous studies have reported that this gene is related to development and growth in Chinese native pigs, such as Enshi black pig, Bama Xiang 59,60 . Together with these findings, TBX19 may act as a candidate gene associated with slow development and growth trait in XP. Studies have revealed that IGF1R is associated with growth animal growth and development, such as Hulun Buir Sheep 14 and European pigs 4 . PROP1 is a homeodomain transcription factor that controls the development of the anterior pituitary gland. Mutations within this gene caused dwarfism in mice and humans due to deficiency of pituitary hormone 61 . Numerous reports have demonstrated association of PROP1 with growth traits in livestock, such as cattle 20 , sheep 62 , and horse 63 . STAC3 is specifically expressed in tissues of skeletal muscle as a nutrient regulated gene and plays a role in vertebrate skeletal muscle development and function 25 . RLF has been reported to markedly enhance DNA methylation of factors related to transcriptional regulation, which plays an essential role in embryonic muscle development 29 . SELENOM has been identified as a candidate gene for growth traits in the Anqing six-end-white pig (ASP) 36 . MSTN (GDF-8) plays a negative role in regulating the formation and differentiation of muscle cells and subsequently inhibits the growth of skeletal muscles 40 . Mutations or deletions in MSTN can cause the proliferation and hypertrophy of muscle cells and muscle fibers. Interestingly, Bama mini-pigs have a higher level of MSTN mRNA than Landrace pigs 64 . Similarly, XP is also a mini-type breed. However, further studies are needed to confirm the expression level of MSTN in XP. Therefore, these genes detailed above may be the most promising genomic evidence that explain the small body size of Xiang pigs. Xiang pigs inhabit a remote mountainous area in Guizhou Province, where is known for its high elevation and high humidity. During long-term adaptation to the harsh environments, Xiang pigs have evolved their unique adaptive characteristics. By combining the results from Fst and θπ statistical analyses, we identified seven genes that participate in the immune response (SERPINB2, ADGRE5, STAT6, IL2, CD80, RHBDD3, PIK 3IP1) 17,19,24,31,33,34,38 , which may be involved in shaping the particular disease-resistance characteristics of Xiang pigs. Additionally, we also uncovered genes, which may play different roles in adaptation processes of Xiang pigs. These genes include NR2E1 for behavioral defense response 12 , SERPINB8 for maintaining the mechanical stability of skin 15 , SERPINB10 involved in the UV-induced cellular response 16 , UVSSA for response to ultraviolet (UV) radiation 30 , VEGFC responsible for regulating oxygen homeostasis 6 . The positive selection signatures identified here provided new insights into the potential adaptive mechanisms in XP. Apart from these findings associated with growth, immunity, and environmental adaptation, some important functional genes associated with reproduction (CCNB2, TRPM6, EYA3, CYP7B1, LIMK2, RSPO1, ADAM32, SPAG16) 13 www.nature.com/scientificreports/ EYA3, LIMK2 and ADAM32 showed particularly strong signs of selective sweeps presumably associated with the domestication of Xiang pig. EYA3 (eyes absent 3) is an ancient retinal-determining gene, and its expression is induced by long-day stimulation in mammals. It is clear that EYA3 is an essential factor for modulating the expression of the TSHβ, which regulates changes in seasonal reproductive biology 26 . It has been reported that EYA3 is an important candidate gene in male and female seasonal estrus 27 . LIMK2, especially the testis-specific isoform tLIMK2, is essential for the proper progression of spermatogenesis. Additionally, LIMK1/2 has shown to participate in mouse embryo early cleavage and blastocyst formation by regulating the phosphorylation level of cofilin in mouse embryos 65 . In humans, three deleterious nonsynonymous substitutions in LIMK2 are involved in male infertility 37 . When comparing with the European commercial breeds (DU and LW), LIMK2 showed strong signs of selective sweeps in XP. Notably, the derived alleles of 115 SNPs in LIMK2 exhibited high AF (AF = 1) in XPs, whereas these derived alleles displayed low AF (AF < 0.35) in European breeds (DU and LW) (Table S10). ADAM32 has been identified as contributing to fertilization in mouse 39 . Comparedwith the MS, DU and LW breeds, XP exhibited lower prolificacy. Therefore, EYA3, LIMK2 and ADAM32 genes might be promising candidate genes involved in the reproduction traits of Xiang pig. Based on the genome-wide SNP allele frequency (AF) analysis, we identified a suite of promising genes. We further found that some of these genes from ΔAF test colocalized with the selected regions detected in the Fst-θπ analysis (Table S11). These overlapped candidate genes may have undergone strong selection, and potentially affected phenotypic traits of Xiang pig, such as ZZHC2, LIMK2 and TBX19 mentioned above. Apart from the overlapped genes, we further focused specifically on the candidate genes detected only in the ΔAF analysis that contained missense variants. Among these candidate genes, two well-known genes (NR6A1, LTBP2) were of particular significance, as they are both involved in affecting the numbers of vertebrae in swine. The NR6A1 (nuclear receptor subfamily 6, group A, member 1) gene is also known as the germ cell nucleus factor (GCNF). Thus far, a large number of studies have confirmed the association between NR6A1 gene polymorphisms and lumbar vertebrae numbers in pigs and other species 42,66 . In the present study, one nonsynonymous mutation (c.T485C, rs326780270) was identified in the coding region of exon-4 of NR6A1 gene. The derived C allele frequency at this SNP locus was 1.0 in XP, whereas it was 0.0 in the European breeds (DU and LW). LTBP2 is expressed prominently in the outer lamellar layers of the annulus fibrosus of the fetal intervertebral disk in humans 67 . LTBP2 regulates the activity of growth differentiation factor Gdf11 by inhibiting the extracellular processing of proGdf11. Gdf11 knock-out mice exhibit an increase in the numbers of ribs from 13 (wild type) to 18 68 . SNP (c.4481A > C) in LTBP2 is associated with thoracic vertebrae numbers in an F2 intercross population between Landrace and Korean native pigs 43 . In our study, we observed one nonsynonymous mutation (c.T2732C, rs328662847) in the coding region of exon-17 of LTBP2 gene. The derived C allele frequency at this SNP locus was 1.0 in XP, whereas it was 0.0 in the European breeds (DU and LW). Given the importance of NR6A1 and LTBP2 to vertebrae numbers, the two SNPs: rs326780270: T > C in NR6A1 and rs328662847: T > C, may affect vertebrae development in XP.
For each comparison, we subsequently investigated the functions associated with the annotated genes from both Fst and θπ statistical approaches by analyzing over-represented GO term and KEGG pathway analysis. In the TT vs XP comparison, the most significant pathway was "Cellular senescence" (Fig. 10a), which is often regarded as an anti-cancer mechanism 69 . In the MS vs XP comparison, the KEGG analysis revealed that "pathways in cancer" represented the greatest enrichment. We also detected calcium signaling pathway, which plays a crucial role in muscle function and plasticity and is involved in many processes during animal embryonic development. Of note, 15 significant GO terms related to the immune system in XP were identified, such as the adaptive immune response, response to virus, toll-like receptor 2 signaling pathway, and interleukin-2 receptor binding (Table S13-2). Other GO terms of interest were related to adaptability, including aggresomal, locomotor behavior, cellular response to heat, cellular response to UV. The results were consistent with the fact that XP exhibits strong disease resistance and high adaptability. In the DU vs XP comparison, the results of KEGG enrichment analysis (Fig. 10c) revealed that eight pathways involved in the immune system or infectious diseases were significantly enriched, including Th1 and Th2 cell differentiation, graft-versus-host disease, allograft rejection, Th17 cell differentiation, intestinal immune network for IgA production, human immunodeficiency virus 1 infection, autoimmune thyroid disease, and human T-cell leukemia virus 1 infection. This may provide a better mechanism for the disease resistance in XP. Compared with Duroc pigs, Xiang pigs are smaller and not as tall. We identified seven significant pathways that were related to developmental and metabolic processes including Wnt signaling pathway, histidine metabolism, inositol phosphate metabolism, metabolic pathways, beta-alanine metabolism, ether lipid metabolism, and arginine and proline metabolism. Among the significant enriched GO categories (Table S13-3), eight terms (GO:0060612 GO:0048536 GO:0001889 GO:0003143 GO:0061303 GO:0055013 GO:0060430 GO:0060322) were associated with development processes. These results indicated that differences in development and metabolism between XPs and DUs. In the LW vs XP comparison, the most significant pathway was "Wnt signaling pathway" (Fig. 10d). Wnt signaling pathway participates in many developmental events during embryogenesis and is essential for muscle fiber growth and maintenance 58,70 . Wnt signaling pathway is also involved in satellite cell proliferation and differentiation during adult skeletal muscle regeneration 71 . We also found significantly over-represented GO terms associated with male reproduction (GO:0007283 GO:0007288 GO:0009566) and growth (GO:0030308 GO:0040008) (Table S13-4). Our findings reflect that the candidate genes are mainly involved in disease resistance, adaptability, and developmental and metabolic processes in XPs. The largest number of terms related to disease resistance and adaptation, which further indicate that Xiang pigs underwent natural selection pressure to adapt to their environments.

Conclusions
We conducted full resequencing of 25 Xiang pigs individually, resulting in a comprehensive whole-genome map and the identification of 21 M heritable SNPs. Moreover, we detected the genomic signatures and found that the genes positively selected in Xiang pigs were involved in crucial biological processes. We highlighted some genes/ regions under possible selection relevant to disease resistance (ZCCHC2), early maturation (TAC3), reproduction (LIMK2, EYA3, ADAM32) and growth (TBX19). We mainly exemplified striking evidence of selection at ZCCHC2 and TBX19, which are correlated with diseases, and growth and early development, respectively. Additionally, important missense mutations with high ΔAF were identified within NR6A1 and LTBP2 genes that mainly contribute to vertebrae numbers of Xiang pigs. These findings will facilitate the understanding of the germplasm characteristics and support further investigation of the mechanisms underlying selection in XP pig breed.

Materials and methods
Ethics statement. All animal procedures were approved by Guizhou University Subcommittee of Experimental Animal Ethics (EAE-GIU-2021-P009) and were conducted the rules of animal experimental ethics. The study is also in accordance with ARRIVE guidelines.
Sample collection and sequencing. In

Mapping, SNP calling and annotation.
To avoid low-quality reads, the raw paired-end reads were filtered and trimmed using the NGSQC Toolkit with default parameters 16 . High-quality reads were then aligned to the Sscrofa11.1 reference sequence using the Burrows-Wheeler Aligner (BWA) with the "bwa-mem" algorithms 72 . SAMtools was used for sorting and indexing the aligned BAM files 73 . Potential duplicates were removed using Picard tools. Finally, we jointly used "HaplotypeCaller", "CombineGVCFs" and "GenotypeGVCFs" with default parameters in GATK to call variants, which generated genotype calls in Variant Call Formats (VCF) 74  Population structure and linkage disequilibrium. To better understand the genetic relationships among the four pig populations in our study, we performed PCA, NJ tree and ADMIXTURE analysis. Principal component analysis (PCA) was carried out using the GCTA software 77 , and the first two eigenvectors were plotted in the ggplot2 package under the R platform. For phylogenetic tree analysis, PLINK was used to calculate genetic distance matrix 78 . Next, the phylogenetic tree was constructed with the neighbor joining (NJ) algorithm in MEGA 79 and displayed with FigTree 80 . Moreover, the population ancestry was estimated using ADMIXTURE software 81 with K set from 2 to 5. The genome-wide linkage disequilibrium (LD) decay between pairwise SNPs was assessed and visualized using PopLDdecay software 82 . Selective sweeps and functional enrichment analyses. Before detecting selection signals, we filtered the SNPs with minor allele frequency (MAF) < 0.05 and call rates < 0.90 and excluded sites with a missing rate > 20% using PLINK. In order to identify the candidate regions under positive selection in XPs, we first calculated the fixation statistics (Fst ) and population nucleotide diversity ratio (θπ) according to the the procedure described by Li et al. 6 . Both the Fst and θπ statistics require the usage of the standard genotype data. The Fst value was estimated based on the differences in allelic frequencies between populations 83 . This method is effective for identifying the genomic pattern affected by the divergent selection 84 . Domestication and selection resulted in decreased nucleotide diversity in populations. Thus, the method based on θπ can serve as efficient statistic to identify signatures of selection in populations 6,74 . In fact, numerous studies have been performed to detect the selective signal sweep regions in animals by using both Fst and θπ methods 6,75,85 . The average Fst and θπ were calculated by VCFtools (v0.1.13) in 100-kb sliding windows with a 10-kb step size 6,85 between XP and the other four control breeds. To avoid spurious selection signals, windows containing less than 10 SNPs were discarded. Putative selection targets were identified as the candidate regions in fully overlapping windows with high Fst (Fst > 95%) and θπ (θπ > 95%) values 85 . The selected regions were annotated using Bedtools (v2.17.0).
The highly differentiated SNPs are likely to be directly targeted by selection or to occur near loci under selection 86 . Therefore, we also estimated allele frequency of single SNP with a genome scan for each pig population and measured the absolute allele frequency difference (ΔAF) to further detect putative selection signatures 5 . The ΔAF method is best suited for detection of fixed or approximately fixed alleles in Xiang pigs 87 . The ΔAF value per SNP between the Xiang population and the other four pig populations was calculated using the formulas: ΔAF = abs (AltAF XP − AltAF TT ), ΔAF = abs (AltAF XP − AltAF MS ), ΔAF = abs (AltAF XP − AltAF DU ), and ΔAF = abs (AltAF XP − AltAF LW ), respectively. After comparisons of ΔAF value between Xiang and each of the other four Scientific Reports | (2022) 12:11823 | https://doi.org/10.1038/s41598-022-14686-w www.nature.com/scientificreports/ groups, we searched for Xiang-specific SNPs across the entire genome with the criteria of AF Xiang > 80% and AF non-Xiang < 20% 88 . By appliying the absolute allele frequency difference (ΔAF) analysis, we could not only furthe evaluate the reliability of the results from Fst and θπ statistics, but also mine the potational selected gene containg interesting candidate mutations.
To investigate the biological enrichment of genes under selective pressure, KEGG pathway and GO classes were analyzed based on the candidated genes from Fst and θπ methods by using KOBAS (v3.0) 89 . The terms with p-value smaller than 0.05 were considered to be statistically significant.